Article

Differentiating data- and text-mining terminology

Authors:
Jan H. Kroeze
Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002

Machdel C. Matthee
Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002

Theo J. D. Bothma
Department of Information Science, School of IT, University of Pretoria, Pretoria, 0002

SAICSIT '03: Proceedings of the 2003 annual research conference of the South African institute of computer scientists and information technologists on Enablement through technology, September 2003, Pages 93–101

Published: 17 September 2003

ABSTRACT

When a new discipline emerges it usually takes some time and lots of academic discussion before concepts and terms get standardised. Such a new discipline is text mining. In a groundbreaking paper, Untangling text data mining, Hearst [1999] tackled the problem of clarifying text-mining concepts and terminology. This essay aims to build on Hearst's ideas by pointing out some inconsistencies and suggesting an improved and extended categorisation of data- and text-mining techniques. The essay is a conceptual study. A short overview of the problems regarding text-mining concepts is given. This is followed by a summary and critical discussion of Hearst's attempt to clarify the terminology. The essence of text mining is found to be the discovery or creation of new knowledge from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are used to differentiate between full-text information retrieval, standard text mining and intelligent text mining. The same parameters are also used to differentiate between related processes for numerical data and text metadata. These distinctions may be used as a road map in the evolving fields of data/information retrieval, knowledge discovery and the creation of new knowledge.
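
The categorisation named in the abstract can be sketched as a small lookup table. This is only an illustration, not the authors' formalisation: the names `Novelty`, `TEXT_PROCESSES` and `text_process_for` are hypothetical, and the pairing assumes that the three novelty levels map onto the three text processes in the order the abstract lists them. The corresponding processes for numerical data and text metadata are not named in the abstract, so they are left out.

```python
from enum import Enum


class Novelty(Enum):
    """Novelty of the investigation, using the three terms from the abstract."""
    NON_NOVEL = "non-novel"
    SEMI_NOVEL = "semi-novel"
    NOVEL = "novel"


# Assumed pairing for full-text documents: each novelty level is matched with
# the process listed in the same position in the abstract.
TEXT_PROCESSES = {
    Novelty.NON_NOVEL: "full-text information retrieval",
    Novelty.SEMI_NOVEL: "standard text mining",
    Novelty.NOVEL: "intelligent text mining",
}


def text_process_for(novelty: Novelty) -> str:
    """Return the text-processing category assumed for a novelty level."""
    return TEXT_PROCESSES[novelty]


if __name__ == "__main__":
    for level in Novelty:
        print(f"{level.value}: {text_process_for(level)}")
```

Keeping the mapping in a dictionary keyed by an enumeration makes it straightforward to add parallel tables for numerical data and text metadata once those terms are taken from the full paper.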

References

ALBRECHT, R. AND MERKL, D. 1998. Knowledge discovery in literature data bases. In Library and information services in astronomy III. (ASP conference series, vol. 153.) http://www.stsci.edu/stsci/meetings/lisa3/albrechtrl.html.
BERSON, A. AND SMITH, S.J. 1997. Data warehousing, data mining, and OLAP. McGraw-Hill, New York, NY.
BIGGS, M. 2000. Resurgent text-mining technology can greatly increase your firm's 'intelligence' factor. InfoWorld 11(2), 52.
CHEN, H. 2001. Knowledge management systems: a text mining perspective. University of Arizona (Knowledge Computing Corporation), Tucson, Arizona.
CORNFORD, T. AND SMITHSON, S. 1996. Project research in information systems: a student's guide. Macmillan, Houndmills. (Information system series.)
HALLIMAN, C. 2001. Business intelligence using smart techniques: environmental scanning using text mining and competitor analysis using scenarios and manual simulation. Information Uncover, Houston, TX.
HAN, J. AND KAMBER, M. 2001. Data mining: concepts and techniques. Morgan Kaufmann, San Francisco, CA.
HEARST, M.A. 1999. Untangling text data mining. In Proceedings of ACL'99: the 37th annual meeting of the association for computational linguistics, University of Maryland, June 20-26 (invited paper). http://www.ai.mit.edu/people/jimmylin/papers/Hearst99a.pdf.
HOVY, E. AND LIN, C.Y. 1999. Automated text summarization in SUMMARIST. In Advances in automated text summarization. I. MANI AND M.T. MAYBURY, Eds. MIT Press, MA, 81-94. http://www.isi.edu/~cyl/.
KONTOS, J., MALAGARDI, I., ALEXANDRIS, C. AND BOULIGARAKI, M. 2000. Greek verb semantic processing for stock market text mining. In Proceedings of natural language processing: 2nd international conference, Patras, Greece, June 2000, D.N. CHRISTODOULAKIS, Ed. Springer, Berlin, 395-405. (Lecture notes in artificial intelligence, no. 1835.)
LUCAS, M. 1999/2000. Mining in textual mountains, an interview with Marti Hearst. Mappa Mundi Magazine, Trip-M, 005, 1-3. http://mappa.mundi.net/trip-m/hearst/.
MACK, R. AND HEHENBERGER, M. 2002. Text-based knowledge discovery: search and mining of life-science documents. Drug discovery today 7(11) (Suppl.), S89-S98.
NASUKAWA, T. AND NAGANO, T. 2001. Text analysis and knowledge mining system. IBM Systems journal 40(4), 967-984.
NEW ZEALAND DIGITAL LIBRARY, UNIVERSITY OF WAIKATO. 2002. Text mining. http://www.cs.waikato.ac.nz/~nzdl/textmining/.
PERRIN, P. AND PETRY, F.E. 2003. Extraction and representation of contextual information for knowledge discovery in texts. Information sciences 151, 125-152.
PONELIS, S. AND FAIRER-WESSELS, F.A. 1998. Knowledge management: a literature overview. South African journal of library and information science 66(1), 1-9.
RAJMAN, M. AND BESANÇON, R. 1998. Text mining: natural language techniques and text mining applications. In Data mining and reverse engineering: searching for semantics, S. SPACCAPIETRA AND F. MARYANSKI, Eds. Chapman and Hall, London, 50-64.
ROB, P. AND CORONEL, C. 2002. Database systems: design, implementation, and management, 5th ed. Course Technology, Boston, MA.
STAIR, R.M. AND REYNOLDS, G.W. 2001. Principles of information systems: a managerial approach, 5th ed. Course Technology, Boston, MA.
SULLIVAN, D. 2000. The need for text mining in business intelligence. DM Review, Dec. 2000. http://www.dmreview.com/master.cfm.
SULLIVAN, D. 2001. Document warehousing and text mining: techniques for improving business operations, marketing, and sales. John Wiley, New York, NY.
THURAISINGHAM, B. 1999. Data mining: technologies, techniques, tools, and trends. CRC Press, Boca Raton, Florida.
WESTPHAL, C.R. AND BLAXTON, T. 1998. Data mining solutions: methods and tools for solving real-world problems. Wiley, New York, NY.
ZORN, P., EMANOIL, M., MARSHALL, L. AND PANEK, M. 1999. Mining meets the web. Online 23(5), 17-28.

Published in
SAICSIT '03, September 2003, 319 pages
ISBN: 1581137745
Editors: Jarr Eloff (Univ. of Pretoria, Pretoria), Andries Engelbrecht (Univ. of Pretoria, Pretoria), Paula Kotzé (Univ. of South Africa, Pretoria), Mariki Eloff (Univ. of South Africa, Pretoria)
Publisher: South African Institute for Computer Scientists and Information Technologists, South Africa
Author Tags: IR, KDD, TDM, algorithms, database queries, documentation, full-text retrieval, information retrieval, knowledge creation, knowledge discovery, knowledge management, languages, measurement, metadata, text data mining, text mining, text-mining, theory
Overall acceptance rate: 187 of 439 submissions, 43%